# A 1 Tb/s 3 W Inductive-Coupling Transceiver for 3D-Stacked Inter-Chip Clock and Data Link

Noriyuki Miura, Daisuke Mizoguchi, Mari Inoue, Kiichi Niitsu, Yoshihiro Nakagawa, Masamoto Tago, Muneo Fukaishi, *Member, IEEE*, Takayasu Sakurai, *Fellow, IEEE*, and Tadahiro Kuroda, *Fellow, IEEE* 

Abstract—A 1 Tb/s 3 W inter-chip transceiver transmits clock and data by inductive coupling at a clock rate of 1 GHz and data rate of 1 Gb/s per channel. 1024 data transceivers are arranged with a pitch of 30  $\mu \rm m$  in a layout area of 1 mm². The total layout area including 16 clock transceivers is 2 mm² in 0.18  $\mu \rm m$  CMOS and the chip thickness is reduced to 10  $\mu \rm m$ . Bi-phase modulation (BPM) is employed for the data link to improve noise immunity, reducing power in the transceiver. Four-phase time division multiple access (TDMA) reduces crosstalk and the bit-error rate (BER) is lower than  $10^{-13}$ .

Index Terms—Inductor, SiP, three-dimensional, wireless interconnect.

#### I. INTRODUCTION

The performance gap between computation within a chip and communication between chips is increasing, making inter-chip communication a bottleneck in the development of high-performance LSI systems. One approach to realizing high-speed interfaces is to shorten the chip-to-chip distance. System in Package (SiP) reduces the chip-to-chip distance significantly by thinning chips and stacking them on each other in a package, which provides strong motivation to develop high-speed, low-power, and high-density interfaces between three-dimensionally stacked chips.

Several 3-D interface technologies have been investigated [1]–[4]. Mechanical wired approaches, such as microbumps or through-Si vias (TSVs), are employed in [1], [2], whereas [3], [4] utilize electrical wireless approaches based on capacitive or inductive coupling. We have developed an inductive-coupling interface [4]–[7] in which chips are stacked and inductively coupled by on-chip metal inductors. A transmitter changes the current in the metal inductor, and a receiver samples the voltage induced through inductive coupling and then recovers the data. The inductive-coupling interface has many advantages over the wired interface. 1) Cost is lower since the interface (metal inductor) can be implemented in a standard LSI process while the wired interface requires an additional mechanical process

Manuscript received April 17, 2006; revised July 28, 2006.

Digital Object Identifier 10.1109/JSSC.2006.886554

for fabrication. 2) Scaling is easier since the inductive-coupling interface removes the scaling limitation imposed by the mechanical process in the wired approach. The inductive-coupling interface is scaled down by shortening the vertical distance, which can be reduced to several micrometers in face-to-face stacked chips. 3) Reliability is higher. The inductive-coupling interface is a non-contact scheme and chips are detachable. By using the interface as a test head, individual chips can be tested before assembly without damaging them. Power for the chips under test can also be transferred through the inductive coupling. Even if the power transfer efficiency is low, the chips can operate since the tester can transmit large power. 4) Area-consuming and highly capacitive ESD protection devices can be removed owing to the non-contact scheme. 5) The inductive-coupling interface can communicate through circuits. Transceiver circuits can be placed under the metal inductor to save layout area; indeed, they are placed under the metal inductor in this work. In addition, the inductive-coupling interface overcomes some limitations of the capacitive-coupling interface: it enables communication across stacks of more than two chips, as reported in [6], while the capacitive-coupling interface is limited to two chips stacked face-to-face [3], [8]–[10]. Since chips can be stacked face-up, power and ground can be provided by bonding wires in low-power applications such as mobile phones or digital cameras. If one of the chips consumes higher power, it can be placed at the bottom and stacked face-down on an area-bump package. For high-performance and scaled systems, TSVs may be necessary to provide power through all stacked chips. However, advanced fine-pitch TSVs and at-speed testing are not required just for DC connections, so the cost and known-good-die (KGD) problems do not occur.

At ISSCC 2005, a 195 Gb/s 1.2 W inductive-coupling transceiver was reported [6] in which 195 transceiver channels are arranged with a pitch of 50 $\mu$m. That transceiver communicates at 195 Gb/s with a power efficiency of 6 mW/Gb/s (6 pJ/b) and an area efficiency of 2.5 mm²/Tb/s. In this work, a state-of-the-art inductive-coupling transceiver is presented for over-terabit/s data communication with higher power and area efficiency. The clock is also provided by inductive coupling for the first time.

# II. INDUCTIVE-COUPLING TRANSCEIVER

Fig. 1 presents the block diagram of the inductive-coupling transceiver. The transceiver comprises 16 slices of a 64ch block, yielding 1024 data transceivers in total. Each 64ch block consists of 64 data transceivers and one clock transceiver. The transmitter clock, Txclk, is transmitted through the inductive coupling, and the receiver clock, Rxclk, is recovered by the

N. Miura, D. Mizoguchi, M. Inoue, K. Niitsu, and T. Kuroda are with the Department of Electronics and Electrical Engineering, Keio University, Yokohama, Kanagawa 223-8522, Japan (e-mail: miura@kuro.elec.keio.ac.jp).

Y. Nakagawa and M. Fukaishi are with the System Device Research Laboratories, NEC Corporation, Sagamihara, Kanagawa 229-1198, Japan.

M. Tago is with the Jisso and Production Technologies Research Laboratories, NEC Corporation, Sagamihara, Kanagawa 229-1198, Japan.

T. Sakurai is with the Center for Collaborative Research, University of Tokyo, Meguro-ku, Tokyo 153-8505, Japan.



Fig. 1. Block diagram of inductive-coupling transceiver.



Fig. 2. Characteristics of transmit current  $I_T$  and received voltage  $V_R$  in data link in (a) time and (b) frequency domain.

clock transceiver. The clock frequency is 1 GHz. The phase interpolator (PI) generates 4 time slots in one clock cycle by creating four-phase clocks from both Txclk and Rxclk for time division multiple access (TDMA). Data transceivers are divided into the time slots to reduce crosstalk. Each data transceiver communicates at 1 Gb/s/channel. 1 Tb/s data bandwidth is obtained by 1024 parallel data links.
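The organization described above can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the constants come from the text, while the even split of channels over the four time slots is an assumption made here for the example.

```python
# Illustrative arithmetic only; constants are taken from the text above.
CLOCK_HZ = 1e9            # Txclk / Rxclk frequency
TDMA_PHASES = 4           # time slots per clock cycle
BLOCKS = 16               # slices of the 64ch block
CHANNELS_PER_BLOCK = 64   # data transceivers per block
RATE_PER_CHANNEL = 1e9    # bit/s per data channel

clock_period_ps = 1e12 / CLOCK_HZ
slot_ps = clock_period_ps / TDMA_PHASES
channels = BLOCKS * CHANNELS_PER_BLOCK
aggregate_tbps = channels * RATE_PER_CHANNEL / 1e12

# Assumption: the channels of a block are split evenly over the time slots.
channels_per_slot_per_block = CHANNELS_PER_BLOCK // TDMA_PHASES

print(f"time slot width      : {slot_ps:.0f} ps")
print(f"data channels        : {channels}")
print(f"aggregate bandwidth  : {aggregate_tbps:.3f} Tb/s")
print(f"channels/slot/block  : {channels_per_slot_per_block}")
```

The 250 ps slot width obtained here is the time budget within which each transmit pulse has to fit.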

# A. Inductive Coupling for Data Link

Fig. 2(a) illustrates time-domain waveforms of transmit current  $I_T$  and received voltage  $V_R$  in the data link with an ideal inductive coupling.  $I_T$  is approximately given by Gaussian-pulse current

$$I_T(t) = I_{pp} \exp\left[-\frac{4(t-t_0)^2}{\tau^2}\right] \tag{1}$$

where $I_{pp}$ is the peak current, $t_0$ is the time offset, and $\tau$ is the pulse width. The ideal inductive coupling functions as an ideal differentiator. Thus, $I_T$ induces a Gaussian monocycle for $V_R$ ($= j\omega M I_T$, where $M$ is the mutual inductance of the inductive coupling). The data receiver samples the latter half of the $V_R$ signal. The sampling-time margin is determined by the pulse width $\tau$, which is set to 125 ps since each transceiver operates in a 250 ps time slot at 1 Gb/s with four-phase TDMA.
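The differentiator behavior can be illustrated numerically. The sketch below evaluates the Gaussian pulse of (1) and its time derivative scaled by $M$, which is the time-domain counterpart of $V_R = j\omega M I_T$. The value of `M` and the choice `T0 = 0` are hypothetical placeholders (the actual parameters are listed in Table I), the peak current uses the approximately 5 mA figure quoted later in the text, and the sign of $V_R$ depends on the winding polarity.

```python
import math

def transmit_current(t, i_pp, t0, tau):
    """Gaussian transmit pulse I_T(t) as in Eq. (1)."""
    return i_pp * math.exp(-4.0 * (t - t0) ** 2 / tau ** 2)

def received_voltage(t, i_pp, t0, tau, m):
    """V_R(t) = M * dI_T/dt, the time-domain form of V_R = j*omega*M*I_T."""
    slope_factor = -8.0 * (t - t0) / tau ** 2
    return m * slope_factor * transmit_current(t, i_pp, t0, tau)

TAU = 125e-12   # pulse width from the text
I_PP = 5e-3     # ~5 mA transmit current quoted in the text
T0 = 0.0        # hypothetical time offset
M = 1e-9        # hypothetical mutual inductance (1 nH), not from Table I

# The monocycle has its two extrema at t0 -/+ tau / (2*sqrt(2)).
peak_offset = TAU / (2.0 * math.sqrt(2.0))
v_peak = abs(received_voltage(T0 + peak_offset, I_PP, T0, TAU, M))

print(f"extrema at t0 +/- {peak_offset * 1e12:.1f} ps")
print(f"|V_R| at the extrema (hypothetical M): {v_peak * 1e3:.1f} mV")
```

With these definitions the first lobe of the monocycle occurs while the current rises and the second, opposite-polarity lobe while it falls, so a receiver sampling the latter half sees the second lobe.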

Fig. 2(b) depicts the frequency characteristics of $I_T$ and $V_R$, where $V_R$ is given by

$$V_{R}(\omega) = j\omega M I_{T}(\omega) = j\omega M \frac{\tau I_{pp}}{16\pi} \exp\left(-\frac{\omega^{2}\tau^{2}}{16}\right) \exp(-j\omega t_{0}). \tag{2}$$

$V_R(\omega)$ has a peak (fundamental) frequency $f_p=\sqrt{2}/(\pi\tau)$ and a bandwidth extending up to $2f_p$. For $\tau=125$ ps, the bandwidth is 7.2 GHz. An actual inductive coupling limits the bandwidth due to parasitics. Layout parameters for the inductive coupling should therefore be designed to maximize $M$ while maintaining the bandwidth.
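As a quick numerical check of the peak frequency and bandwidth quoted above, the following sketch evaluates $f_p=\sqrt{2}/(\pi\tau)$ for $\tau = 125$ ps and compares it with a brute-force search for the maximum of $\omega \exp(-\omega^2\tau^2/16)$, the frequency-dependent part of $|V_R(\omega)|$. The search grid (10 MHz steps up to 20 GHz) is an arbitrary choice made for the example.

```python
import math

def peak_frequency(tau):
    """Closed-form peak of |V_R|: f_p = sqrt(2) / (pi * tau)."""
    return math.sqrt(2.0) / (math.pi * tau)

def spectrum_shape(f, tau):
    """|V_R(omega)| up to a constant factor: omega * exp(-omega^2 tau^2 / 16)."""
    w = 2.0 * math.pi * f
    return w * math.exp(-((w * tau) ** 2) / 16.0)

TAU = 125e-12
f_p = peak_frequency(TAU)
bandwidth = 2.0 * f_p

# Brute-force cross-check on a 10 MHz grid from 10 MHz to 20 GHz.
freqs = [i * 1e7 for i in range(1, 2001)]
f_numeric = max(freqs, key=lambda f: spectrum_shape(f, TAU))

print(f"f_p (closed form) : {f_p / 1e9:.2f} GHz")
print(f"f_p (grid search) : {f_numeric / 1e9:.2f} GHz")
print(f"bandwidth 2*f_p   : {bandwidth / 1e9:.1f} GHz")
```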

Fig. 3(a) shows an equivalent circuit for the actual inductive coupling where transmitter and receiver inductors are



Fig. 3. Model of inductive coupling. (a) Equivalent circuit. (b) Frequency characteristics (circuit parameters for data link are shown).

TABLE I
LAYOUT AND CIRCUIT PARAMETERS OF INDUCTIVE COUPLING



for Channel Pitch=30 μm, Communication Distance=15 μm in 0.18 μm CMOS

modeled as parallel resonators. The equivalent circuit gives a transimpedance of the inductive coupling

$$\frac{V_R}{I_T} = \frac{1}{(1 - \omega^2 L_R C_R) + j\omega R_R C_R} \cdot j\omega M \cdot \frac{1}{(1 - \omega^2 L_T C_T) + j\omega R_T C_T}. \tag{3}$$

Due to parasitics, the transmitter and receiver inductors behave as second-order low-pass filters, and they limit the bandwidth to their self-resonance frequencies $f_{SR,T}=1/(2\pi\sqrt{L_TC_T})$ and $f_{SR,R}=1/(2\pi\sqrt{L_RC_R})$.
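The transimpedance of (3) and the self-resonance frequencies are straightforward to evaluate numerically. The sketch below does so for a set of purely hypothetical element values, chosen only so that both self-resonance frequencies lie above the 7.2 GHz signal bandwidth; they are not the Table I parameters.

```python
import math

def self_resonance(l, c):
    """Self-resonance frequency f_SR = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(l * c))

def transimpedance(f, m, l_t, c_t, r_t, l_r, c_r, r_r):
    """V_R / I_T from Eq. (3) at frequency f (complex value, in ohms)."""
    w = 2.0 * math.pi * f
    rx_filter = 1.0 / complex(1.0 - w * w * l_r * c_r, w * r_r * c_r)
    tx_filter = 1.0 / complex(1.0 - w * w * l_t * c_t, w * r_t * c_t)
    return rx_filter * (1j * w * m) * tx_filter

# Hypothetical element values for illustration only (not from Table I).
PARAMS = dict(m=0.5e-9,
              l_t=1e-9, c_t=20e-15, r_t=20.0,
              l_r=5e-9, c_r=20e-15, r_r=200.0)

f_sr_t = self_resonance(PARAMS["l_t"], PARAMS["c_t"])
f_sr_r = self_resonance(PARAMS["l_r"], PARAMS["c_r"])
print(f"f_SR,T = {f_sr_t / 1e9:.1f} GHz, f_SR,R = {f_sr_r / 1e9:.1f} GHz")

for f in (0.1e9, 1e9, 3.6e9, 7.2e9):
    z = transimpedance(f, **PARAMS)
    ideal = 2.0 * math.pi * f * PARAMS["m"]   # ideal differentiator |j*omega*M|
    print(f"{f / 1e9:4.1f} GHz: |Z| = {abs(z):6.2f} ohm, ideal = {ideal:6.2f} ohm")
```

Well below both self-resonance frequencies the two filter terms approach unity and the coupling reduces to the ideal differentiator $j\omega M$; the deviation grows as the frequency approaches the lower of the two self-resonance frequencies.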

The inductive coupling for the data link is designed for a channel pitch of 30 $\mu$m and a communication distance of 15 $\mu$m in 0.18 $\mu$m six-metal CMOS. Four metal layers are used for the inductor and two metal layers are used for the transceiver circuits placed under the inductor. Optimized layout and circuit parameters of the transmitter and receiver inductors are summarized in Table I. The diameter of both inductors is 29.5 $\mu$m to make full use of the given channel pitch of 30 $\mu$m. The resistance of the transmitter inductor should be low enough for it to be driven with a high transmit current ($\sim$5 mA). Therefore, the transmitter inductor should use a wider wire and fewer turns. For the receiver inductor, on the other hand, the number of turns should be increased and the wire should be narrower to increase the self-inductance as well as the mutual inductance, until its self-resonance frequency $f_{SR,R}$ falls to the signal bandwidth of 7.2 GHz. Fig. 3(b) shows the frequency characteristics of the inductive coupling for the data link. The inductive coupling exhibits a sufficiently wide bandwidth (over 7.2 GHz) and functions as a differentiator.

#### B. Inductive Coupling for Clock Link

The layout and circuit parameters of the inductive coupling for the clock link are also summarized in Table I. Unlike the data link, the clock link transmits a 1 GHz narrowband signal. Therefore, it is designed to have a self-resonance frequency at 1 GHz with a higher Q factor (> 4) so that the transimpedance can be maximized and ambient noise can be effectively cut off.

# C. Bi-Phase Modulation Data Transceiver

Fig. 4 shows the schematic diagram of the data transceiver and simulated waveforms. Bi-phase modulation (BPM) signaling is employed for the data link. At the positive edge of Txclk, a pulse generator in the data transmitter produces negative pulse voltages whose pulsewidth is determined by the delay of the inverter chain. NOR and NORB perform pulse



Fig. 4. BPM data transceiver and simulated waveforms.



Fig. 5. (a) Simulated waveforms and (b) calculated BER in NRZ and BPM signaling.

shaping. A succeeding H-bridge circuit generates positive or negative pulse current,  $I_T$ , according to Txdata. In every clock cycle, the positive pulse is generated when Txdata is high, and the negative pulse is generated when Txdata is low. A sense-amplifier flip-flop in the data receiver samples positive or negative pulse voltage,  $V_R$ , corresponding to the polarity of  $I_T$ , and then it recovers Rxdata.

In the previous work [6], non-return-to-zero (NRZ) signaling was employed. Fig. 5(a) shows the $V_R$ signals in the NRZ and BPM signaling. In the NRZ signaling, no $V_R$ signal is generated when the same data continues. In the BPM signaling, on the other hand, the $V_R$ signal is generated in every clock cycle. Therefore, noise immunity of the receiver is improved and the receiver's sensitivity in the BPM signaling can be maximized, while that in the NRZ signaling has to be set low enough to ignore crosstalk. The high sensitivity in the BPM signaling enables a lower bit-error rate (BER) with smaller transmission power. BER in the NRZ and BPM signaling is calculated respectively by

$$BER_{NRZ} = \frac{1}{2} \operatorname{erfc} \left( \frac{\tau}{4\sqrt{2}\tau_{j,rms}} \sqrt{\ln \frac{1 - NSR - CSR}{NSR + CSR}} \right) \tag{4}$$

$$BER_{BPM} = \frac{1}{2} \operatorname{erfc} \left( \frac{\tau}{4\sqrt{2}\tau_{j,rms}} \sqrt{\ln \frac{1 - NSR - CSR}{NSR}} \right) \tag{5}$$

where  $\tau$  is the pulsewidth of  $I_T$  defined in (1),  $\tau_{j,rms}$  is rms jitter in Rxclk, CSR is crosstalk-to-signal ratio and NSR is noise-to-signal ratio. NSR is reduced by increasing transmit pulse energy and then BER is lowered. On the other hand, even if the transmit pulse energy is increased, CSR remains constant because not only signal but also crosstalk is increased. The receiver's sensitivity in the NRZ signaling has to be set low enough to ignore the increased crosstalk. Therefore, larger transmit pulse energy is required in the NRZ signaling to obtain  $BER_{NRZ}$  as



Fig. 6. Wireless clock transceiver and simulated waveforms.



Fig. 7. Four-phase TDMA.

low as $BER_{BPM}$. Fig. 5(b) presents calculated $BER_{NRZ}$ and $BER_{BPM}$ as a function of the transmit pulse energy. The pulse energy in the BPM signaling is reduced by a factor of 3 for a BER of $10^{-9}$. For lower BER, the energy reduction becomes more significant. Although switching activity is doubled in the BPM signaling, the net power dissipation of the data transceiver is reduced.
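Equations (4) and (5) can be evaluated directly. The following sketch is one way to do so; the NSR and CSR values are hypothetical, and the 7.4 ps rms jitter is borrowed from the measurement reported later in the paper rather than from the conditions used for Fig. 5(b), which are not stated in the text.

```python
import math

def _ber(tau, tau_j_rms, ratio):
    """Shared form of Eqs. (4) and (5); 'ratio' is the argument of ln()."""
    if ratio <= 1.0:
        raise ValueError("ln argument must exceed 1 for a real square root")
    arg = tau / (4.0 * math.sqrt(2.0) * tau_j_rms) * math.sqrt(math.log(ratio))
    return 0.5 * math.erfc(arg)

def ber_nrz(tau, tau_j_rms, nsr, csr):
    """Eq. (4): crosstalk adds to noise in the denominator."""
    return _ber(tau, tau_j_rms, (1.0 - nsr - csr) / (nsr + csr))

def ber_bpm(tau, tau_j_rms, nsr, csr):
    """Eq. (5): only noise remains in the denominator."""
    return _ber(tau, tau_j_rms, (1.0 - nsr - csr) / nsr)

TAU = 125e-12        # pulse width from the text
TAU_J = 7.4e-12      # rms jitter, taken from the later measurement
CSR = 0.10           # hypothetical crosstalk-to-signal ratio

for nsr in (0.20, 0.10, 0.05):   # hypothetical noise-to-signal ratios
    print(f"NSR={nsr:.2f}: BER_NRZ={ber_nrz(TAU, TAU_J, nsr, CSR):.2e}, "
          f"BER_BPM={ber_bpm(TAU, TAU_J, nsr, CSR):.2e}")
```

For any positive CSR the BPM expression yields the lower BER at the same NSR, and the two expressions coincide when CSR is zero, which is consistent with the argument that BPM needs less transmit pulse energy for a given BER.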

#### D. Wireless Clock Transceiver

Fig. 6 presents the wireless clock transceiver. An H-bridge circuit in the transmitter is driven by Txclk with buffers INV, INVB whose fanout is set high enough ( $\sim$ 10) to reduce harmonics in Txclk and generate triangular current  $I_{TC}$ .

Pre-amplifiers in the receiver amplify received voltage  $V_{RC}$ . Succeeding feedback inverter chains recover full-swing Rxclk.

The clock transceiver is an asynchronous circuit, like those in [8]–[13], which consumes considerable power due to static power dissipation. However, with the synchronous clock it provides, highly sensitive yet low-power circuits can be utilized for the data transceiver. Since the data link dominates the power dissipation of the 64ch block, total power dissipation is reduced by employing the synchronous data transceiver.

At the self-resonant frequency, transimpedance of the inductive coupling is quite sensitive to errors in LC component values. By tuning LC, the transimpedance can be maximized and the transmit current can be minimized. However, since power dissipation of the clock link is not dominant in this implementation,



Fig. 8. Block diagram of inductive-coupling transceiver with test circuits.



Fig. 9. Chip microphotographs.

the clock transmitter simply provides extra transmit current to overcome the errors.

# E. Four-Phase TDMA

Four-phase TDMA is utilized for crosstalk reduction. Fig. 7 describes the scheme and simulated waveforms. The phase interpolator generates four-phase clocks that are assigned in a checkerboard-like pattern across the data transceiver array. Simulated waveforms are shown on the left. When the channel pitch is reduced to 30 $\mu$m, the crosstalk increases to the same level as the signal. Two-phase TDMA reduces crosstalk to half of the signal; however, this is not low enough for communication with BER lower than $10^{-13}$. Four-phase TDMA reduces crosstalk to a peak voltage of 10 mV and enables BER lower than $10^{-13}$.
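The geometric benefit of moving from two to four phases can be sketched with a small model. The exact phase assignment is shown only in Fig. 7, so the two tilings below (`two_phase`, `four_phase`) are assumptions made for illustration: a plain checkerboard for two phases and a repeating 2 × 2 tile for four phases.

```python
import math

def two_phase(row, col):
    """Hypothetical two-phase checkerboard assignment."""
    return (row + col) % 2

def four_phase(row, col):
    """Hypothetical four-phase assignment: a 2 x 2 tile repeated over the array."""
    return 2 * (row % 2) + (col % 2)

def nearest_same_phase_um(phase_fn, pitch_um, size=8):
    """Distance from channel (0, 0) to the closest channel sharing its time slot."""
    reference = phase_fn(0, 0)
    best = math.inf
    for r in range(size):
        for c in range(size):
            if (r, c) != (0, 0) and phase_fn(r, c) == reference:
                best = min(best, pitch_um * math.hypot(r, c))
    return best

PITCH_UM = 30.0
for row in range(4):
    print(" ".join(str(four_phase(row, col)) for col in range(8)))

print(f"two-phase : nearest same-slot channel at "
      f"{nearest_same_phase_um(two_phase, PITCH_UM):.1f} um")
print(f"four-phase: nearest same-slot channel at "
      f"{nearest_same_phase_um(four_phase, PITCH_UM):.1f} um")
```

Under these assumed tilings, no channel shares a time slot with any of its eight immediate neighbors in the four-phase case, whereas diagonal neighbors still transmit simultaneously in the two-phase case.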

#### III. TEST CHIP DESIGN AND EXPERIMENTAL SETUP

Fig. 8 depicts the block diagram of the test circuits. A delay controller, a TDMA controller, a pitch controller, and built-in self-test (BIST) circuits are implemented in the 64ch block. The phase timing of Rxclk is adjusted by the delay controller in steps of UI/128 (7.8 ps). The TDMA controller changes the number of phases and the phase assignment so that the transceiver can be tested with four-phase TDMA, two-phase TDMA, or no TDMA for comparison. The pitch controller selects the activated channels to change the channel pitch and the number of aggregated channels. The BIST circuits are implemented for BER measurement. Pseudo-random binary sequence (PRBS) generators produce a $2^{23}-1$ word pattern for the transmitted data. The number of errors in the received data is counted in the receiver. A scan chain initializes the PRBS generators and outputs the measured error count for BER measurement.
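A software model of the PRBS generation and error counting may help clarify what the BIST circuits do. The sketch below is an assumption-laden illustration: the paper does not state the generator polynomial, so the common ITU-T O.150 choice $x^{23}+x^{18}+1$ is assumed, and the function names are hypothetical. The period check is run on the short PRBS7 polynomial $x^7+x^6+1$ to keep the example fast.

```python
def lfsr_step(state, order, tap):
    """One shift of a Fibonacci LFSR; feedback = stage[order] XOR stage[tap]."""
    mask = (1 << order) - 1
    feedback = ((state >> (order - 1)) ^ (state >> (tap - 1))) & 1
    return ((state << 1) | feedback) & mask, feedback

def prbs_bits(order, tap, count, seed=1):
    """Return 'count' PRBS bits from a non-zero seed."""
    if seed & ((1 << order) - 1) == 0:
        raise ValueError("seed must be non-zero")
    state, bits = seed, []
    for _ in range(count):
        state, bit = lfsr_step(state, order, tap)
        bits.append(bit)
    return bits

def lfsr_period(order, tap, seed=1):
    """Number of shifts until the register returns to its seed state."""
    state, steps = seed, 0
    while True:
        state, _ = lfsr_step(state, order, tap)
        steps += 1
        if state == seed:
            return steps

def count_errors(tx_bits, rx_bits):
    """Error counter: number of positions where received and sent bits differ."""
    return sum(t != r for t, r in zip(tx_bits, rx_bits))

# PRBS23 with the assumed polynomial x^23 + x^18 + 1.
tx = prbs_bits(order=23, tap=18, count=1000)
rx = list(tx)
rx[500] ^= 1                      # inject a single bit error
print("errors counted:", count_errors(tx, rx))
print("PRBS7 period  :", lfsr_period(order=7, tap=6))
print("PRBS23 length :", 2 ** 23 - 1, "bits")
```

At 1 Gb/s a sequence of $2^{23}-1$ bits repeats roughly every 8.4 ms, so a long BER measurement cycles through the full pattern many times.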

Fig. 9 shows microphotographs of the test chips fabricated in 0.18 $\mu m$ CMOS. The transmitter chip is placed on top of the receiver chip, with both chips face-up. Both chips are polished to a thickness of 10 $\mu m$. The communication distance, including an adhesive layer, is 15 $\mu m$. The clock transceiver transmits the 1 GHz clock through a metal inductor with a diameter of 200 $\mu m$. The



Fig. 10. Infrared photo of stacked test chips.



Fig. 11. Experimental setup.

clock transceiver is provided for every 64 data transceivers. The data transceiver communicates at 1 Gb/s/channel through a metal inductor with a diameter of 29.5 $\mu$m. 1024 data transceivers are arranged with a pitch of 30 $\mu$m. The transmitter and receiver circuits are placed under the metal inductors to save layout area. Because of the compact layout, inter-channel skew in the 64ch block is suppressed to 11 ps in the clock distribution network. Experiments indicate that the influence of the transceiver circuits on the inductive channel is negligibly small. Fig. 10 shows infrared photos of the stacked chips. The two chips are aligned by conventional infrared alignment using alignment patterns in the top-metal layer. The measured alignment error is less than 3 $\mu$m, which is negligible.

Fig. 11 describes the experimental setup for the stacked test chips. The stacked chips are mounted on a wafer, placed on a probe station without an electromagnetic shield, and tested in a laboratory room with no control of temperature, dust, or air. A probe card provides connections between the stacked chips and external sources. Power and ground are provided by DC probes. Scan-in data is generated by an external data-timing generator to





Fig. 12. Measured received clock and jitter.

initialize and control the chips. The data-timing generator provides a differential 1 GHz clock, Txclk, to the transmitter chip. The wireless clock transceiver transmits the clock to the receiver chip, which recovers Rxclk. In the on-chip BIST circuits, PRBS generators produce a $2^{23} - 1$ word pattern for Txdata, and errors in Rxdata are counted. An oscilloscope monitors the waveforms of Rxclk and Rxdata. A logic analyzer measures the number of errors and calculates BER.

## IV. MEASUREMENT RESULTS

# A. Wireless Clock Transmission

Fig. 12 presents measurement results of wireless clock transmission. The 1 GHz clock is successfully transmitted by the wireless clock transceiver. Rms jitter is 9.5 ps in Rxclk, some of which is caused by 6 ps-rms jitter in Txclk by the external data timing generator. Jitter produced in the clock transceiver



Fig. 13. Snapshot of data waveforms (1 Gb/s,  $2^{23} - 1$  PRBS data).

can be estimated as 7.4 ps (= $\sqrt{9.5^2 - 6^2}$ ps), assuming the two jitter sources are uncorrelated. The clock transmitter consumes 4 mW and the clock receiver consumes 6 mW from a 1.8 V supply.
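The jitter estimate above is a root-sum-square subtraction, which holds when the source jitter and the jitter added by the link are uncorrelated. A minimal sketch of the arithmetic, with a hypothetical helper name, is:

```python
import math

def added_rms_jitter(total_ps, source_ps):
    """Jitter added by the link, assuming it is uncorrelated with the source."""
    if total_ps < source_ps:
        raise ValueError("total jitter cannot be smaller than source jitter")
    return math.sqrt(total_ps ** 2 - source_ps ** 2)

RXCLK_JITTER_PS = 9.5   # measured rms jitter in Rxclk
TXCLK_JITTER_PS = 6.0   # rms jitter of the external data-timing generator

link_jitter_ps = added_rms_jitter(RXCLK_JITTER_PS, TXCLK_JITTER_PS)
clock_link_power_mw = 4.0 + 6.0   # clock transmitter + clock receiver

print(f"jitter added by clock link : {link_jitter_ps:.1f} ps rms")
print(f"power of one clock link    : {clock_link_power_mw:.0f} mW")
```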

## B. Single Channel Data Communication

The 1 Gb/s/channel data communication is demonstrated in Fig. 13. The 1 GHz clock is transmitted by the wireless clock transceiver and a $2^{23} - 1$ PRBS generator provides the transmitted data. A snapshot of the data waveforms is presented on the right. It shows that both the data and clock transceivers operate correctly. The delay between Txdata and Rxdata is 10 ns, including the delay caused by cables and buffers in the experimental setup. After removing these contributions, it is confirmed that the latency between Txdata and Rxdata is 1 clock cycle. The measured BER is lower than $10^{-14}$, which is as reliable as that of wired interfaces. The data transmitter consumes 2 mW and the data receiver consumes 0.4 mW from a 1.8 V supply. The measured timing bathtub curve is shown in Fig. 14. The on-chip delay controller sweeps the phase timing of Rxclk in 7.8 ps steps. BER lower than $10^{-13}$ is examined using the $2^{23}-1$ PRBS data at 1 Gb/s. A timing margin of 150 ps is obtained. The margin is sufficiently wide that the timing can be easily adjusted. The edges of the bathtub curve match the results calculated for a sampling clock with 7.4 ps rms jitter, which indicates that the assumption on the sampling jitter is reasonable. In addition, there is no difference in the measured timing bathtub curve of a transceiver in which circuits are not placed under the metal inductor; interference between the transceiver circuits and the metal inductors is therefore negligible. However, it is necessary to consider noise coupling between the transmitter and a blocking field.
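To give a sense of the measurement effort behind a BER bound of this order, the sketch below estimates how many error-free bits must be observed to claim a BER below a given limit. The paper does not state the test duration or any confidence level; the 95% figure and the standard zero-error bound $N = -\ln(1 - CL)/BER$ are assumptions introduced here.

```python
import math

def bits_for_zero_error_test(ber_limit, confidence):
    """Error-free bits needed to claim BER < ber_limit at the given confidence."""
    if ber_limit <= 0.0 or not (0.0 < confidence < 1.0):
        raise ValueError("need ber_limit > 0 and 0 < confidence < 1")
    return -math.log(1.0 - confidence) / ber_limit

DATA_RATE = 1e9      # 1 Gb/s per channel, from the text
CONFIDENCE = 0.95    # assumed confidence level (not stated in the paper)

for limit in (1e-13, 1e-14):
    bits = bits_for_zero_error_test(limit, CONFIDENCE)
    hours = bits / DATA_RATE / 3600.0
    print(f"BER < {limit:.0e}: {bits:.2e} error-free bits, about {hours:.1f} h")
```

Under these assumptions a single channel at 1 Gb/s needs on the order of hours of error-free operation per sampling point for the $10^{-13}$ bound, and ten times longer for $10^{-14}$.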

#### C. Array Communication

BER dependence on channel pitch and the number of phases in TDMA was measured. The measured results are plotted in Fig. 15. By increasing the number of phases in TDMA, crosstalk is reduced and the channel pitch can be shortened for the same BER. 1024 transceivers arranged with a pitch of 30  $\mu$ m operate



Fig. 14. Measured timing bathtub curve.

at BER lower than  $10^{-13}$  with the four-phase TDMA. As a result, aggregate data bandwidth of 1 Tb/s is achieved with 1 mm<sup>2</sup> area for the data transceivers. Fig. 16 presents measured timing bathtub curve of the center channel in the array where all the surrounding channels are operating. Although the timing margin is reduced to 130 ps due to the crosstalk, it is still wide enough to adjust the sampling timing against inter-channel skew and PVT variations.

# D. Performance Summary and Comparison

Chip performance is summarized in Table II. A 1 Tb/s data bandwidth is obtained by 1024 data transceivers arranged with a pitch of 30 $\mu$m. The 1 GHz clock is also provided by the inductive coupling. The transceiver chip consumes 3 W from a 1.8 V supply, of which the 1024 data transceivers consume 2.4 W and the 16 clock transceivers and 16 phase interpolators consume 0.6 W. The layout area for the data link is 1 mm² and that for the clock link is 1 mm² in 0.18 $\mu$m CMOS. Power dissipation is



Fig. 15. Measured BER dependence on channel pitch.



Fig. 16. Measured timing bathtub curve of center channel in channel array.

3 mW/Gb/s, which is half that of the previous work [6], and area is $1 \text{ mm}^2/\text{Tb/s}$, which is 40% of that of the previous work [6].
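The power and area figures follow from the per-channel measurements reported earlier. The short sketch below reproduces the arithmetic using only numbers quoted in the text; the variable names are illustrative.

```python
# Per-channel and shared figures quoted in the text.
N_DATA = 1024
TX_MW, RX_MW = 2.0, 0.4          # data transmitter / receiver power per channel
CLOCK_AND_PI_W = 0.6             # 16 clock transceivers + 16 phase interpolators
DATA_AREA_MM2 = 1.0
BANDWIDTH_GBPS = N_DATA * 1.0    # 1 Gb/s per channel

data_power_w = N_DATA * (TX_MW + RX_MW) / 1000.0
total_power_w = data_power_w + CLOCK_AND_PI_W
energy_pj_per_bit = total_power_w * 1000.0 / BANDWIDTH_GBPS   # mW/Gb/s == pJ/b
area_mm2_per_tbps = DATA_AREA_MM2 / (BANDWIDTH_GBPS / 1000.0)

# Previous work [6]: 195 Gb/s at 1.2 W, 2.5 mm^2/Tb/s.
prev_energy = 1200.0 / 195.0
prev_area = 2.5

print(f"data link power   : {data_power_w:.2f} W")
print(f"total power       : {total_power_w:.2f} W")
print(f"energy efficiency : {energy_pj_per_bit:.2f} pJ/b "
      f"({energy_pj_per_bit / prev_energy:.0%} of previous work)")
print(f"area efficiency   : {area_mm2_per_tbps:.2f} mm^2/Tb/s "
      f"({area_mm2_per_tbps / prev_area:.0%} of previous work)")
```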

The chip performance is compared with that of transceiver chips reported at ISSCC [1], [14]–[27] in Fig. 17. Since 3-D interfaces [1], [4], [6]–[9], [26] communicate in close proximity, they have advantages in improving the bandwidth with higher power and area efficiency. The presented inductive-coupling transceiver achieves the second highest bandwidth with the second lowest power and the smallest layout area.

# V. CONCLUSION

A 1 Tb/s 3 W inductive-coupling transceiver has been developed. A 1 GHz clock is also transmitted by the proposed wireless clock transceiver. BPM signaling improves noise immunity, reducing power dissipation to 3 mW/Gb/s (= 3 pJ/b). Four-phase TDMA reduces crosstalk, decreasing the channel pitch to 30 $\mu$m and the layout area to 1 mm<sup>2</sup>. As a result, among transceiver chips reported at ISSCC, the inductive-coupling transceiver achieves the second highest bandwidth with the second lowest power and the smallest layout area.

#### ACKNOWLEDGMENT

The VLSI chips in this study have been fabricated in the chip fabrication program of VLSI Design and Education Center (VDEC), the University of Tokyo in collaboration with MOSIS and Taiwan Semiconductor Manufacturing Company (TSMC).

### REFERENCES

- [1] T. Ezaki et al., "A 160 Gb/s interface design configuration for multichip LSI," in *IEEE ISSCC Dig. Tech. Papers*, 2004, pp. 140–141.
- [2] J. Burns et al., "Three-dimensional integrated circuits for low-power, high-bandwidth systems on a chip," in *IEEE ISSCC Dig. Tech. Papers*, 2001, pp. 268–269.
- [3] K. Kanda et al., "A 1.27 Gb/s/ch 3 mW/pin wireless superconnect (WSC) interface scheme," in *IEEE ISSCC Dig. Tech. Papers*, 2003, pp. 186–187.
- [4] D. Mizoguchi et al., "A 1.2 Gb/s/pin wireless superconnect based on inductive inter-chip signaling (IIS)," in *IEEE ISSCC Dig. Tech. Papers*, 2004, pp. 142–143.
- [5] N. Miura et al., "Analysis and design of inductive coupling and transceiver circuit for inductive inter-chip wireless superconnect," *IEEE J. Solid-State Circuits*, vol. 40, no. 4, pp. 829–837, Apr. 2005.
- [6] N. Miura et al., "A 195-Gb/s 1.2-W inductive inter-chip wireless superconnect with transmit power control scheme for 3-D-stacked system in a package," *IEEE J. Solid-State Circuits*, vol. 41, no. 1, pp. 23–34, Jan. 2006.
- [7] N. Miura et al., "A 1 Tb/s 3 W inductive-coupling transceiver for interchip clock and data link," in *IEEE ISSCC Dig. Tech. Papers*, 2006, pp. 424–425.
- [8] R. Drost et al., "Proximity communication," *IEEE J. Solid-State Circuits*, vol. 39, no. 9, pp. 1529–1535, Sep. 2004.
- [9] R. Drost et al., "Electronic alignment for proximity communication," in *IEEE ISSCC Dig. Tech. Papers*, 2004, pp. 144–145.
- [10] L. Luo et al., "3 Gb/s AC-coupled chip-to-chip communication using a low-swing pulse receiver," in *IEEE ISSCC Dig. Tech. Papers*, 2005, pp. 522–523.
- [11] A. Iwata et al., "A 3-D integration scheme utilizing wireless interconnections for implementing hyper brains," in *IEEE ISSCC Dig. Tech. Papers*, 2005, pp. 262–263.
- [12] M. Sasaki et al., "A 0.95 mW/1.0 Gbps spiral-inductor based wireless chip-interconnect with asynchronous communication scheme," in *Symp. VLSI Circuits Dig. Tech. Papers*, 2005, pp. 348–351.

TABLE II
PERFORMANCE SUMMARY AND COMPARISON

| Parameter                   | Value                                        |
|-----------------------------|----------------------------------------------|
| Total Bandwidth             | 1 Tb/s (5× of [5])                           |
| Number of Data Transceivers | 1024                                         |
| Channel Pitch               | 30 μm                                        |
| Clock                       | 1 GHz, Inductive Coupling                    |
| Signaling                   | Bi-Phase Modulation + 4-phase TDMA           |
| Power Dissipation           | 3 W @ 1.8 V (Tx: 2 W, Rx: 0.4 W, Clock: 0.6 W) |
| Total Area                  | Data: 1 mm² (Clock: 1 mm²)                   |
| Power/Bandwidth             | 3 mW/Gb/s (50% of [5])                       |
| Area/Bandwidth              | 1 mm²/Tb/s (40% of [5])                      |
| Communication Distance      | 15 μm (Chip Thickness: 10 μm, Adhesive: 5 μm) |
| Process                     | 0.18 μm CMOS                                 |

[5] N. Miura (ISSCC'05)



Fig. 17. Performance comparisons with ISSCC transceiver chips in (a) bandwidth, (b) power, (c) layout area.

- [13] J. Xu et al., "2.8 Gb/s inductively coupled interconnect for 3-D ICs," in *Symp. VLSI Circuits Dig. Tech. Papers*, 2005, pp. 352–355.
- [14] Y. Unekawa et al., "A 5 Gb/s 8 × 8 ATM switch element CMOS LSI supporting five quality-of-service classes with 200 MHz LVDS interface," in *IEEE ISSCC Dig. Tech. Papers*, 1996, pp. 118–119.
- [15] Y. Ohtomo et al., "A 40 Gb/s 8 × 8 ATM switch LSI using 0.25 μm CMOS/SIMOX," in *IEEE ISSCC Dig. Tech. Papers*, 1997, pp. 154–155.
- [16] B. Lau et al., "A 2.6 GB/s multi-purpose chip-to-chip interface," in *IEEE ISSCC Dig. Tech. Papers*, 1998, pp. 162–163.
- [17] T. Takahashi et al., "110 GB/s simultaneous bi-directional transceiver logic synchronized with a system clock," in *IEEE ISSCC Dig. Tech. Papers*, 1999, pp. 176–177.
- [18] M. Fukaishi et al., "A 20 Gb/s CMOS multi-channel transmitter and receiver chip set for ultrahigh resolution digital display," in *IEEE ISSCC Dig. Tech. Papers*, 2000, pp. 260–261.
- [19] K. Yang et al., "A scalable 32 Gb/s parallel data transceiver with on-chip timing calibration circuits," in *IEEE ISSCC Dig. Tech. Papers*, 2000, pp. 258–259.
- [20] T. Tanahashi et al., "A 2 Gb/s 21CH low-latency transceiver circuit for inter-processor communication," in *IEEE ISSCC Dig. Tech. Papers*, 2001, pp. 60–61.
- [21] R. Nair et al., "A 28.5 GB/s CMOS non-blocking router for terabit/s connectivity between multiple processors and peripheral I/O nodes," in *IEEE ISSCC Dig. Tech. Papers*, 2001, pp. 224–225.
- [22] P. Landman et al., "A 62 Gb/s backplane interconnect ASIC based on 3.1 Gb/s serial-link technology," in *IEEE ISSCC Dig. Tech. Papers*, 2002, pp. 52–53.
- [23] K. Tanaka et al., "A 100 Gb/s transceiver with GND-VDD common-mode receiver and flexible multi-channel aligner," in *IEEE ISSCC Dig. Tech. Papers*, 2002, pp. 264–265.
- [24] G. Paul et al., "A scalable 160 Gb/s switch fabric processor with 320 Gb/s memory bandwidth," in *IEEE ISSCC Dig. Tech. Papers*, 2004, pp. 410–411.
- [25] K. Chang et al., "Clocking and circuit design for a parallel I/O on a first-generation CELL processor," in *IEEE ISSCC Dig. Tech. Papers*, 2005, pp. 526–527.
- [26] K. Kumagai et al., "System-in-silicon architecture and its application to H.264/AVC motion estimation for 1080HDTV," in *IEEE ISSCC Dig. Tech. Papers*, 2006, pp. 430–431.
- [27] J. Yamada et al., "High-speed interconnect for a multiprocessor server using over 1 Tb/s crossbar," in *IEEE ISSCC Dig. Tech. Papers*, 2006, pp. 108–109.



Since 2002, he has been engaged in research on 3D-stacked inductive inter-chip wireless interface for System in a Package. In 2002, he was with Hitachi Central Research Laboratory to study CAD tools for low-power VLSI circuits. In 2004, he worked as an

Design Award. He is a Fellow of the Japan Society for the Promotion of Science.





using FPGA boards. In 2005, he moved to Renesas Technology Corporation, Tokyo, Japan.



Mari Inoue received the B.S. degree in electrical engineering in 2005 from Keio University, Kanagawa, Japan, where she is currently working toward the M.S. degree.

Since 2004, she has been engaged in research on the 3-D-stacked inductive inter-chip wireless interface for System in a Package.



Kiichi Niitsu received the B.S. degree in electrical engineering from Keio University, Yokohama, Japan, in 2006, where he is currently working toward the M.S. degree.

Since 2005, he has been engaged in research on the 3-D-stacked inductive inter-chip wireless interface for System in a Package.



Yoshihiro Nakagawa was born in Yamagata, Japan, in 1976. He received the B.S., M.S., and Ph.D. degrees in mechanical engineering from Tohoku University, Japan, in 1999, 2001, and 2004, respectively.

He joined the System Device Laboratories, NEC Corporation, Kanagawa, Japan, in 2004, and has been engaged in the research and development of high-speed CMOS data-communication circuits.



Masamoto Tago received the B.E. and M.E. degrees in metallurgical engineering from Tokai University, Japan, in 1988 and 1990, respectively.

He joined NEC Corporation, Kanagawa, Japan, in 1990, and is currently with the Jisso and Production Technologies Research Laboratories. He has been engaged in the research and development of advanced VLSI packaging technology and three-dimensional LSI chip packaging technology.



Muneo Fukaishi (M'99) was born in Nagano, Japan, in June 1967. He received the B.E. degree in applied physics from Waseda University, Japan, in 1991.

After joining NEC Corporation in 1991, he was engaged in the research and development of process and device technologies of GaAs LSIs. Since April 1996, he has been engaged in the research and development of Si LSI design, especially high-speed data communication LSIs, in the System Device Research Laboratories. From 2002 to 2003, he was a Visiting Scholar in the Microsystems Technology Laboratories at the Massachusetts Institute of Technology (MIT), MA.

Mr. Fukaishi is a member of the IEEE Solid-State Circuits Society and the Institute of Electronics, Information, and Communication Engineers (IEICE) of Japan. He was awarded the 1998 Conference Paper Award by the IEICE of Japan.



**Takayasu Sakurai** (S'77–M'78–SM'01–F'03) received the Ph.D. degree in electrical engineering from the University of Tokyo, Tokyo, Japan, in 1981.

In 1981, he joined Toshiba Corporation, where he designed CMOS DRAM, SRAM, RISC processors, DSPs, and SoC solutions. He has worked extensively on interconnect delay and capacitance modeling, known as the Sakurai model, and the alpha power-law MOS model. From 1988 through 1990, he was a visiting researcher at the University of California, Berkeley, where he conducted research in the field of VLSI CAD. Since 1996, he has been a Professor at the University of Tokyo, working on low-power high-speed VLSI, memory design, interconnects, ubiquitous electronics, organic ICs, and large-area electronics. He has published more than 350 publications, including 70 invited publications and several books, and has filed more than 100 patents.

Prof. Sakurai served as a conference chair for the Symposium on VLSI Circuits and ICICDT, a vice chair for ASPDAC, a TPC chair for the first A-SSCC and the VLSI Symposium, and a program committee member for ISSCC, CICC, DAC, ESSCIRC, ICCAD, ISLPED, and other international conferences. He is a recipient of the 2005 IEEE ICICDT award, the 2004 IEEE ISSCC Takuo Sugano award, and the 2005 P&I patent of the year award. He was a plenary speaker for the 2003 ISSCC and 2004 ESSCIRC. He consults for U.S. startup companies. He is an IEEE Fellow, a STARC Fellow, an elected AdCom member for the IEEE Solid-State Circuits Society, and an IEEE CAS and SSCS distinguished lecturer.



**Tadahiro Kuroda** (M'88–SM'00–F'06) received the Ph.D. degree in electrical engineering from the University of Tokyo, Tokyo, Japan, in 1999.

In 1982, he joined Toshiba Corporation, where he designed CMOS SRAMs, gate arrays, and standard cells. From 1988 to 1990, he was a Visiting Scholar with the University of California, Berkeley, where he conducted research in the field of VLSI CAD. In 1990, he returned to Toshiba and engaged in the research and development of BiCMOS ASICs, ECL gate arrays, high-speed CMOS LSIs for telecommunications, and low-power CMOS LSIs for multimedia and mobile applications. In 2000, he moved to Keio University, Yokohama, Japan, where he has been a Professor since 2002. He has been a Visiting Professor at Hiroshima University, Japan, and the University of California, Berkeley. His research interests include low-power, high-speed CMOS design for wireless and wireline communications, human computer interactions, and ubiquitous electronics. He has published more than 200 technical publications, including 50 invited papers and 18 books/chapters, and has filed more than 100 patents.

Dr. Kuroda served as the General Chairman for the Symposium on VLSI Circuits, the Vice Chairman for ASP-DAC, sub-committee chairs for ICCAD, A-SSCC, and SSDM, and program committee members for the Symposium on VLSI Circuits, CICC, DAC, ASP-DAC, ISLPED, SSDM, ISQED, and other international conferences. He is a recipient of the 2005 IEEE System LSI Award, the 2005 P&I Patent of the Year Award, and the 2006 LSI IP Design Award. He is an IEEE Fellow.